{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training with multiple GPUs from scratch\n",
    "\n",
    "This tutorials shows how we can increase performance by distributing train across multiple GPUs.\n",
    "So, as you might expect, running this tutorial requires at least 2 GPUs. \n",
    "And these days multi-GPU machines are actually quite common. \n",
    "The following figure depicts 4 GPUs on a single machine and connected to the CPU through a PCIe switch. \n",
    "\n",
    "![](../img/multi-gpu.svg)\n",
    "\n",
    "If a NVIDIA driver is installed on our machine,\n",
    "then we can check how many GPUs are available by running the command `nvidia-smi`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sun Jul 23 09:01:41 2017       \n",
      "+-----------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |\n",
      "|-------------------------------+----------------------+----------------------+\n",
      "| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |\n",
      "|===============================+======================+======================|\n",
      "|   0  GRID K520           Off  | 0000:00:03.0     Off |                  N/A |\n",
      "| N/A   42C    P0    42W / 125W |      0MiB /  4036MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   1  GRID K520           Off  | 0000:00:04.0     Off |                  N/A |\n",
      "| N/A   34C    P0    41W / 125W |      0MiB /  4036MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   2  GRID K520           Off  | 0000:00:05.0     Off |                  N/A |\n",
      "| N/A   32C    P0    42W / 125W |      0MiB /  4036MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "|   3  GRID K520           Off  | 0000:00:06.0     Off |                  N/A |\n",
      "| N/A   28C    P0    37W / 125W |      0MiB /  4036MiB |      0%      Default |\n",
      "+-------------------------------+----------------------+----------------------+\n",
      "                                                                               \n",
      "+-----------------------------------------------------------------------------+\n",
      "| Processes:                                                       GPU Memory |\n",
      "|  GPU       PID  Type  Process name                               Usage      |\n",
      "|=============================================================================|\n",
      "|  No running processes found                                                 |\n",
      "+-----------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to use all of the GPUs on together for the purpose of significantly speeding up training (in terms of wall clock). \n",
    "Remember that CPUs and GPUs each can have multiple cores. \n",
    "CPUs on a laptop might have 2 or 4 cores, and on a server might have up to 16 or 32 cores. \n",
    "GPUs tend to have many more cores -ban NVIDIA K80 GPU has 4992 - but run at slower clock speeds. \n",
    "Exploiting the parallism across the GPU cores is how GPUs get their speed advantage in the first place. \n",
    "\n",
    "As compared to the single CPU or single GPU setting where all the cores are typically used by default,\n",
    "parallelism across devices is a little more complicated.\n",
    "That's because most layers of a neural network can only run on a single devince. \n",
    "So, in order to parallelize across devices, we need to do a little extra\n",
    "Therefore, we need to do some additional work to partition a workload across multiple GPUs. \n",
    "This can be done in a few ways.\n",
    "\n",
    "## Data Parallelism\n",
    "\n",
    "For deep learning, data parallelism is far the most widely used approach for partitioning workloads. \n",
    "It works like this: Assume that we have *k* GPUs. We split the examples in a data batch into *k* parts,\n",
    "and send each part to a different GPUs which then computes the gradient that part of the batch. \n",
    "Finally, we collect the gradients from each of the GPUs and sum them together before updating the weights.\n",
    "\n",
    "The following pseudo-code shows how to train one data batch on *k* GPUs.\n",
    "\n",
    "\n",
    "```\n",
    "def train_batch(data, k):\n",
    "    split data into k parts\n",
    "    for i = 1, ..., k:  # run in parallel\n",
    "        compute grad_i w.r.t. weight_i using data_i on the i-th GPU\n",
    "    grad = grad_1 + ... + grad_k\n",
    "    for i = 1, ..., k:  # run in parallel\n",
    "        copy grad to i-th GPU\n",
    "        update weight_i by using grad\n",
    "```\n",
    "\n",
    "Next we will present how to implement this algorithm from scratch.\n",
    "\n",
    "\n",
    "## Automatic Parallelization\n",
    "\n",
    "We first demonstrate how to run workloads in parallel. \n",
    "Writing parallel code in Python in non-trivial, but fortunately, \n",
    "MXNet is able to automatically parallelize the workloads. \n",
    "Two technologies help to achieve this goal.\n",
    "\n",
    "First, workloads, such as `nd.dot` are pushed into the backend engine for *lazy evaluation*. \n",
    "That is, Python merely pushes the workload `nd.dot` and returns immediately \n",
    "without waiting for the computation to be finished. \n",
    "We keep pushing until the results need to be copied out from MXNet, \n",
    "such as `print(x)` or are converted into numpy by `x.asnumpy()`. \n",
    "At that time, the Python thread is blocked until the results are ready."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== workloads are pushed into the backend engine ===\n",
      "0.000778 sec\n",
      "=== workloads are finished ===\n",
      "0.254895 sec\n"
     ]
    }
   ],
   "source": [
    "from mxnet import nd\n",
    "from time import time\n",
    "\n",
    "start = time()\n",
    "x = nd.random_uniform(shape=(2000,2000))\n",
    "y = nd.dot(x, x)\n",
    "print('=== workloads are pushed into the backend engine ===\\n%f sec' % (time() - start))\n",
    "z = y.asnumpy()\n",
    "print('=== workloads are finished ===\\n%f sec' % (time() - start))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, MXNet depends on a powerful scheduling algorithm that analyzes the dependecies of the pushed workloads.\n",
    "This scheduler checks to see if two workloads are independent of each other.\n",
    "If they are, then the engine may run them in parallel.\n",
    "If a workload depend on results that have not yet been computed, it will be made to wait until its inputs are ready.\n",
    "\n",
    "For example, if we call three operators:\n",
    "\n",
    "```\n",
    "a = nd.random_uniform(...)\n",
    "b = nd.random_uniform(...)\n",
    "c = a + b\n",
    "```\n",
    "\n",
    "Then the computation for `a` and `b` may run in parallel, \n",
    "while `c` cannot be computed until both `a` and `b` are ready. \n",
    "\n",
    "The following code shows that the engine effectively parallelizes the `dot` operations on two GPUs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Run on GPU 0 and 1 in sequential ===\n",
      "time: 2.305928 sec\n",
      "=== Run on GPU 0 and 1 in parallel ===\n",
      "time: 1.080638 sec\n"
     ]
    }
   ],
   "source": [
    "from mxnet import gpu\n",
    "\n",
    "def run(x):\n",
    "    \"\"\"push 10 matrix-matrix multiplications\"\"\"\n",
    "    return [nd.dot(x,x) for i in range(10)]\n",
    "\n",
    "def wait(x):\n",
    "    \"\"\"explicitly wait until all results are ready\"\"\"\n",
    "    for y in x:\n",
    "        y.wait_to_read()\n",
    "\n",
    "x0 = nd.random_uniform(shape=(4000, 4000), ctx=gpu(0))\n",
    "x1 = x0.copyto(gpu(1))\n",
    "\n",
    "print('=== Run on GPU 0 and 1 in sequential ===')\n",
    "start = time()\n",
    "wait(run(x0))\n",
    "wait(run(x1))\n",
    "print('time: %f sec' %(time() - start))\n",
    "\n",
    "print('=== Run on GPU 0 and 1 in parallel ===')\n",
    "start = time()\n",
    "y0 = run(x0)\n",
    "y1 = run(x1)\n",
    "wait(y0)\n",
    "wait(y1)\n",
    "print('time: %f sec' %(time() - start))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== Run on GPU 0 and then copy results to CPU in sequential ===\n",
      "1.36029291153\n",
      "=== Run and copy in parallel ===\n",
      "1.10839509964\n"
     ]
    }
   ],
   "source": [
    "from mxnet import cpu\n",
    "\n",
    "def copy(x, ctx):\n",
    "    \"\"\"copy data to a device\"\"\"\n",
    "    return [y.copyto(ctx) for y in x]\n",
    "\n",
    "print('=== Run on GPU 0 and then copy results to CPU in sequential ===')\n",
    "start = time()\n",
    "y0 = run(x0)\n",
    "wait(y0)\n",
    "z0 = copy(y0, cpu())\n",
    "wait(z0)\n",
    "print(time() - start)\n",
    "\n",
    "print('=== Run and copy in parallel ===')\n",
    "start = time()\n",
    "y0 = run(x0)\n",
    "z0 = copy(y0, cpu())\n",
    "wait(z0)\n",
    "print(time() - start)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define model and updater\n",
    "\n",
    "We will use the convolutional neural networks and plain SGD introduced in [cnn-scratch]() as an example workload."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from mxnet import gluon\n",
    "# initialize parameters\n",
    "scale = .01\n",
    "W1 = nd.random_normal(shape=(20,1,3,3))*scale\n",
    "b1 = nd.zeros(shape=20)\n",
    "W2 = nd.random_normal(shape=(50,20,5,5))*scale\n",
    "b2 = nd.zeros(shape=50)\n",
    "W3 = nd.random_normal(shape=(800,128))*scale\n",
    "b3 = nd.zeros(shape=128)\n",
    "W4 = nd.random_normal(shape=(128,10))*scale\n",
    "b4 = nd.zeros(shape=10)\n",
    "params = [W1, b1, W2, b2, W3, b3, W4, b4]\n",
    "\n",
    "# network and loss\n",
    "def lenet(X, params):\n",
    "    # first conv\n",
    "    h1_conv = nd.Convolution(data=X, weight=params[0], bias=params[1], kernel=(3,3), num_filter=20)\n",
    "    h1_activation = nd.relu(h1_conv)\n",
    "    h1 = nd.Pooling(data=h1_activation, pool_type=\"avg\", kernel=(2,2), stride=(2,2))\n",
    "    # second conv\n",
    "    h2_conv = nd.Convolution(data=h1, weight=params[2], bias=params[3], kernel=(5,5), num_filter=50)\n",
    "    h2_activation = nd.relu(h2_conv)\n",
    "    h2 = nd.Pooling(data=h2_activation, pool_type=\"avg\", kernel=(2,2), stride=(2,2))\n",
    "    h2 = nd.flatten(h2)\n",
    "    # first fullc\n",
    "    h3_linear = nd.dot(h2, params[4]) + params[5]\n",
    "    h3 = nd.relu(h3_linear)\n",
    "    # second fullc    \n",
    "    yhat = nd.dot(h3, params[6]) + params[7]    \n",
    "    return yhat\n",
    "\n",
    "loss = gluon.loss.SoftmaxCrossEntropyLoss()\n",
    "\n",
    "# plain SGD\n",
    "def SGD(params, lr):\n",
    "    for p in params:\n",
    "        p[:] = p - lr * p.grad"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utility functions to synchronize data across GPUs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following function copies the parameters into a particular GPU and initializes the gradients."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== copy b1 to GPU(0) ===\n",
      "weight = \n",
      "[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.\n",
      "  0.  0.]\n",
      "<NDArray 20 @gpu(0)>\n",
      "grad = \n",
      "[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.\n",
      "  0.  0.]\n",
      "<NDArray 20 @gpu(0)>\n"
     ]
    }
   ],
   "source": [
    "def get_params(params, ctx):\n",
    "    new_params = [p.copyto(ctx) for p in params]\n",
    "    for p in new_params:\n",
    "        p.attach_grad()\n",
    "    return new_params\n",
    "\n",
    "new_params = get_params(params, gpu(0))\n",
    "print('=== copy b1 to GPU(0) ===\\nweight = {}\\ngrad = {}'.format(\n",
    "    new_params[1], new_params[1].grad))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Given a list of data that spans multiple GPUs, we then define a function to sum the data \n",
    "and broadcast the results to each GPU. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== before allreduce ===\n",
      " [\n",
      "[[ 1.  1.]]\n",
      "<NDArray 1x2 @gpu(0)>, \n",
      "[[ 2.  2.]]\n",
      "<NDArray 1x2 @gpu(1)>]\n",
      "\n",
      "=== after allreduce ===\n",
      " [\n",
      "[[ 3.  3.]]\n",
      "<NDArray 1x2 @gpu(0)>, \n",
      "[[ 3.  3.]]\n",
      "<NDArray 1x2 @gpu(1)>]\n"
     ]
    }
   ],
   "source": [
    "def allreduce(data):\n",
    "    # sum on data[0].context, and then broadcast\n",
    "    for i in range(1, len(data)):\n",
    "        data[0][:] += data[i].copyto(data[0].context)\n",
    "    for i in range(1, len(data)):\n",
    "        data[0].copyto(data[i])        \n",
    "\n",
    "data = [nd.ones((1,2), ctx=gpu(i))*(i+1) for i in range(2)]\n",
    "print(\"=== before allreduce ===\\n {}\".format(data))\n",
    "allreduce(data)\n",
    "print(\"\\n=== after allreduce ===\\n {}\".format(data))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given a data batch, we define a function that splits this batch and copies each part into the corresponding GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "=== original data ===\n",
      "[[  0.   1.   2.   3.]\n",
      " [  4.   5.   6.   7.]\n",
      " [  8.   9.  10.  11.]\n",
      " [ 12.  13.  14.  15.]]\n",
      "<NDArray 4x4 @cpu(0)>\n",
      "\n",
      "=== splitted into [gpu(0), gpu(1)] ===\n",
      "[[ 0.  1.  2.  3.]\n",
      " [ 4.  5.  6.  7.]]\n",
      "<NDArray 2x4 @gpu(0)>\n",
      "\n",
      "[[  8.   9.  10.  11.]\n",
      " [ 12.  13.  14.  15.]]\n",
      "<NDArray 2x4 @gpu(1)>\n"
     ]
    }
   ],
   "source": [
    "def split_and_load(data, ctx):\n",
    "    n, k = data.shape[0], len(ctx)\n",
    "    assert (n//k)*k == n, '# examples is not divided by # devices'\n",
    "    idx = list(range(0, n+1, n//k))\n",
    "    return [data[idx[i]:idx[i+1]].as_in_context(ctx[i]) for i in range(k)]\n",
    "\n",
    "batch = nd.arange(16).reshape((4,4))\n",
    "print('=== original data ==={}'.format(batch))\n",
    "ctx = [gpu(0), gpu(1)]\n",
    "splitted = split_and_load(batch, ctx)\n",
    "print('\\n=== splitted into {} ==={}\\n{}'.format(ctx, splitted[0], splitted[1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train and inference one data batch\n",
    "\n",
    "Now we are ready to implement how to train one data batch with data parallelism."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def train_batch(batch, params, ctx, lr):\n",
    "    # split the data batch and load them on GPUs\n",
    "    data = split_and_load(batch.data[0], ctx)\n",
    "    label = split_and_load(batch.label[0], ctx)\n",
    "    # run forward on each GPU\n",
    "    with gluon.autograd.record():\n",
    "        losses = [loss(lenet(X, W), Y) \n",
    "                  for X, Y, W in zip(data, label, params)]\n",
    "    # run backward on each gpu\n",
    "    for l in losses:\n",
    "        l.backward()\n",
    "    # aggregate gradient over GPUs\n",
    "    for i in range(len(params[0])):                \n",
    "        allreduce([params[c][i].grad for c in range(len(ctx))])\n",
    "    # update parameters with SGD on each GPU\n",
    "    for p in params:\n",
    "        SGD(p, lr/batch.data[0].shape[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For inference, we simply let it run on the first GPU. We leave a data parallelism implementation as an exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def valid_batch(batch, params, ctx):\n",
    "    data = batch.data[0].as_in_context(ctx[0])\n",
    "    pred = nd.argmax(lenet(data, params[0]), axis=1)\n",
    "    return nd.sum(pred == batch.label[0].as_in_context(ctx[0])).asscalar()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Put all things together\n",
    "\n",
    "Define the program that trains and validates the model on MNIST. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "from mxnet.test_utils import get_mnist\n",
    "from mxnet.io import NDArrayIter\n",
    "\n",
    "def run(num_gpus, batch_size, lr):    \n",
    "    # the list of GPUs will be used\n",
    "    ctx = [gpu(i) for i in range(num_gpus)]\n",
    "    print('Running on {}'.format(ctx))\n",
    "    \n",
    "    # data iterator\n",
    "    mnist = get_mnist()\n",
    "    train_data = NDArrayIter(mnist[\"train_data\"], mnist[\"train_label\"], batch_size)\n",
    "    valid_data = NDArrayIter(mnist[\"test_data\"], mnist[\"test_label\"], batch_size)\n",
    "    print('Batch size is {}'.format(batch_size))\n",
    "    \n",
    "    # copy parameters to all GPUs\n",
    "    dev_params = [get_params(params, c) for c in ctx]\n",
    "    for epoch in range(5):\n",
    "        # train\n",
    "        start = time()\n",
    "        train_data.reset()\n",
    "        for batch in train_data:\n",
    "            train_batch(batch, dev_params, ctx, lr)\n",
    "        nd.waitall()  # wait all computations are finished to benchmark the time\n",
    "        print('Epoch %d, training time = %.1f sec'%(epoch, time()-start))\n",
    "        \n",
    "        # validating\n",
    "        valid_data.reset()\n",
    "        correct, num = 0.0, 0.0\n",
    "        for batch in valid_data:\n",
    "            correct += valid_batch(batch, dev_params, ctx)\n",
    "            num += batch.data[0].shape[0]                \n",
    "        print('         validation accuracy = %.4f'%(correct/num))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First run on a single GPU with batch size 64."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running on [gpu(0)]\n",
      "Batch size is 64\n",
      "Epoch 0, training time = 5.3 sec\n",
      "         validation accuracy = 0.9551\n",
      "Epoch 1, training time = 4.9 sec\n",
      "         validation accuracy = 0.9769\n",
      "Epoch 2, training time = 4.9 sec\n",
      "         validation accuracy = 0.9823\n",
      "Epoch 3, training time = 4.9 sec\n",
      "         validation accuracy = 0.9845\n",
      "Epoch 4, training time = 5.0 sec\n",
      "         validation accuracy = 0.9810\n"
     ]
    }
   ],
   "source": [
    "run(1, 64, 0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running on multiple GPUs, we often want to increase the batch size so that each GPU still gets a large enough batch size for good computation performance. A larger batch size sometimes slows down the convergence, we often want to increases the learning rate as well.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running on [gpu(0), gpu(1)]\n",
      "Batch size is 128\n",
      "Epoch 0, training time = 3.2 sec\n",
      "         validation accuracy = 0.9255\n",
      "Epoch 1, training time = 3.0 sec\n",
      "         validation accuracy = 0.9656\n",
      "Epoch 2, training time = 3.2 sec\n",
      "         validation accuracy = 0.9762\n",
      "Epoch 3, training time = 3.2 sec\n",
      "         validation accuracy = 0.9790\n",
      "Epoch 4, training time = 3.1 sec\n",
      "         validation accuracy = 0.9831\n"
     ]
    }
   ],
   "source": [
    "run(2, 128, 0.6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Conclusion\n",
    "\n",
    "We have shown how to implement data parallelism on a deep neural network from scratch. Thanks to the auto-parallelism, we only need to write serial codes while the engine is able to parallelize them on multiple GPUs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next\n",
    "[Training with multiple GPUs with ``gluon``](../chapter07_distributed-learning/multiple-gpus-gluon.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For whinges or inquiries, [open an issue on  GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
